csc263

Notes for CSC263 - Data Structures and Analysis

Hash Tables

Hash tables are another implementation of the dictionary ADT, but one whose operations (under suitable assumptions) run in constant expected time

Relevant sets: $U$, the universe of all possible keys, and the (typically much smaller) set of keys actually stored

Naive Implementation - Direct Addressing

Keep an array S of size $\vert U\vert $; for a key $k \in \lbrace 0, …, \vert U\vert - 1 \rbrace$, store its value in S[k]. If key $k$ is not present, store a flag in S[k] marking it empty

e.g. if $\vert U\vert = 4$ and we insert(2, 1), then S = [-, -, 1, -]
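The scheme above can be sketched directly; this is a minimal illustration (the class and sentinel names are my own, not from the notes):

```python
# Direct-address table: one slot per possible key in {0, ..., |U| - 1}.
# EMPTY is a sentinel playing the role of the "empty" flag from the notes.
EMPTY = object()

class DirectAddressTable:
    def __init__(self, universe_size):
        self.S = [EMPTY] * universe_size  # array of size |U|

    def insert(self, k, v):
        self.S[k] = v                      # O(1): key is the index

    def search(self, k):
        return None if self.S[k] is EMPTY else self.S[k]  # O(1)

    def delete(self, k):
        self.S[k] = EMPTY                  # O(1)

t = DirectAddressTable(4)
t.insert(2, 1)   # S is now [-, -, 1, -]
```

Every operation is a single array access, which is why this is worst-case O(1); the cost is the O(|U|) space.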

But what if $\vert U\vert $ is too big?

Hash Functions

A hash function maps a large set of inputs to a small set of things to use as keys

If we have an array S of size $m$, we can use a hash function $h : U \to \lbrace 0, …, m - 1 \rbrace$ to “hash” keys into places in S
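The notes deliberately leave $h$ unspecified; as one concrete (toy) choice, the division method $h(k) = k \bmod m$ maps any integer key into $\lbrace 0, …, m - 1 \rbrace$:

```python
# Division-method hash: an illustrative choice of h, not the notes' choice.
def h(k, m):
    return k % m   # always lands in {0, ..., m - 1}

m = 10
slot = h(1234567, m)   # a large key compressed into a small slot index
```

Real hash functions are chosen more carefully (hence the Hogwarts quote below), but any $h : U \to \lbrace 0, …, m-1 \rbrace$ fits the definition.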


Don’t worry about actual hash functions, they are mysterious.

“People go to Hogwarts for years to learn how to do this sort of thing.”

​ - Danny Heap, on proving the effectiveness of hash functions


Collisions

If $h : U \to K$ and $\vert K\vert < \vert U\vert $, then $h$ cannot be injective, so there could be a collision: two different keys share the same hash

Solutions:

Chaining

Each slot S[i] holds a linked list of all key-value pairs whose keys hash to $i$; insert adds to the list, and search/delete walk it
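A chained hash table can be sketched as follows; a Python list stands in for each linked list, and the division-method hash is again just an illustrative choice:

```python
# Hash table with chaining: S[i] is the chain of (key, value) pairs
# whose keys hash to slot i.
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.S = [[] for _ in range(m)]   # m empty chains

    def _h(self, k):
        return k % self.m                 # illustrative hash function

    def insert(self, k, v):
        chain = self.S[self._h(k)]
        for i, (k2, _) in enumerate(chain):
            if k2 == k:                   # key already present: overwrite
                chain[i] = (k, v)
                return
        chain.append((k, v))

    def search(self, k):
        for k2, v in self.S[self._h(k)]:  # walk only this slot's chain
            if k2 == k:
                return v
        return None

    def delete(self, k):
        i = self._h(k)
        self.S[i] = [(k2, v) for k2, v in self.S[i] if k2 != k]

t = ChainedHashTable(4)
t.insert(1, 'a')
t.insert(5, 'b')   # 5 mod 4 == 1 mod 4: collides with key 1, shares a chain
```

Search and delete cost is proportional to the length of one chain, which is what the analysis below bounds in expectation.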

SUHA - Simple Uniform Hashing Assumption

We assume that each key is equally likely to hash into any of the $m$ slots $\lbrace 0, …, m-1 \rbrace$, independently of where the other keys hash

Application to chaining

SUHA implies that the slots are symmetric: $\forall i, j \in \lbrace 0, …, m - 1 \rbrace$, $E[n_i] = E[n_j]$, where $n_x$ is the number of keys hashed to slot $x$ (the actual counts $n_i$ can still differ between slots)

$\displaystyle \sum_{i = 0}^{m - 1} E[n_i] = n$, and by SUHA all the $E[n_i]$ are equal, so each $\displaystyle E[n_i] = \frac{n}{m}$ (we call $\alpha = n / m$ the load factor)

Performance degrades if $\alpha$ gets too big (as $n$ grows), but we can keep $\alpha$ small by expanding the hash table: increase $m$ and rehash every key into the new, larger table
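Table expansion can be sketched like this; the doubling factor and the load-factor threshold of 1 are illustrative choices, not prescribed by the notes:

```python
# Table doubling for a chained hash table represented as a list of chains
# of (key, value) pairs, using the illustrative hash h(k) = k mod m.

def rehash(table, new_m):
    """Rebuild the table with new_m slots, rehashing every stored key."""
    new_table = [[] for _ in range(new_m)]
    for chain in table:
        for k, v in chain:
            new_table[k % new_m].append((k, v))
    return new_table

def insert(table, k, v, n):
    """Insert (k, v); expand first if the load factor would exceed 1."""
    m = len(table)
    if (n + 1) / m > 1.0:              # alpha = n/m about to pass 1: double m
        table = rehash(table, 2 * m)
    table[k % len(table)].append((k, v))
    return table, n + 1

table = [[] for _ in range(2)]
n = 0
table, n = insert(table, 1, 'a', n)
table, n = insert(table, 3, 'b', n)
table, n = insert(table, 5, 'c', n)   # third key triggers doubling to m = 4
```

Each expansion costs O(n) to rehash, but doubling makes expansions rare enough that the amortized cost per insert stays O(1).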